
Background read-only region creation#1919

Open
jmpesp wants to merge 3 commits into oxidecomputer:main from jmpesp:concurrent_read_only_clone

Conversation

Contributor

@jmpesp jmpesp commented Apr 9, 2026

When the Crucible Agent is requested to create a read-only region from a remote Downstairs source, it currently blocks the worker thread: region creation is performed in the worker loop, so the loop cannot respond to other state changes.

This commit spawns region creation threads that the main worker thread can send requests to, and sends all read-only region creation requests there.

This builds on the previous work to separate the serialized on-disk types from the in-memory types: a `Creating` state is added to the in-memory type and is used while this background creation is in progress.
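As a rough illustration of the hand-off described above, here is a minimal std-only sketch of a worker loop sending creation requests to a background thread over a channel. The names (`CreateRequest`, `run_creator`) are hypothetical; the real request and state types live in the Crucible Agent.

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical stand-in for the agent's read-only region creation request.
struct CreateRequest {
    id: u64,
}

// Spawn a background creator thread, feed it requests over a channel, and
// return the ids it handled once the channel closes.
fn run_creator(requests: Vec<CreateRequest>) -> Vec<u64> {
    let (tx, rx) = mpsc::channel::<CreateRequest>();

    let creator = thread::spawn(move || {
        let mut created = Vec::new();
        // recv() returns Err once every sender is dropped, ending the loop.
        while let Ok(req) = rx.recv() {
            // A real implementation would clone the region from the remote
            // Downstairs source here, then move it from Creating to Created.
            created.push(req.id);
        }
        created
    });

    for req in requests {
        tx.send(req).unwrap();
    }
    drop(tx); // close the channel so the creator thread can exit

    creator.join().unwrap()
}

fn main() {
    let created = run_creator((0..3).map(|id| CreateRequest { id }).collect());
    println!("{:?}", created); // prints [0, 1, 2]
}
```

The key property is that the sender returns immediately after `send`, so the main worker loop stays free to handle other state changes while creation runs in the background.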

@jmpesp jmpesp requested a review from leftwo April 9, 2026 01:10
Contributor

@leftwo leftwo left a comment


Thanks for the work here, I have some questions for you.

```rust
let log0 = log.new(o!("component" => "worker"));
let df0 = Arc::clone(&df);
std::thread::spawn(|| {
    tokio::spawn(async {
```
Contributor

If we are going from a real thread to a tokio task, could a long-running region create trip us up here? The old way used a thread, which could go off and do whatever for an hour while the rest of the agent continued working. Do we run any risk of losing that here?

Contributor Author

I'm not sure about all the differences between threads and tasks, but I don't think there's a risk. Whether the worker runs in a thread or a task, the read/write region creation occurs separately from the dropshot server and the datafile manipulation logic.

```rust
);
df.fail(&r.id);
break 'requested;
std::process::exit(1);
```
Contributor

what happens to the agent if we fail like this? Is it going to crash and restart?

Contributor Author

Yep!

Contributor

I'm a little concerned that, if we fail here, we restart the whole agent. If it's a persistent failure, do we get ourselves into a crash loop?

Looking at the error, though, it's a failure to send the request over the channel to one of our worker threads, and that should be a difficult situation to reach, correct? And if we do see it, a restart of the process is likely to behave differently. I just want to avoid setting ourselves up for a crash loop.
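For context on when that send can fail: with std's mpsc channels, `Sender::send` returns an error only when the receiving end has been dropped, i.e. the background creation thread itself is gone. A minimal sketch of that failure mode (the `u64` payload is a placeholder):

```rust
use std::sync::mpsc;

fn main() {
    let (tx, rx) = mpsc::channel::<u64>();

    // Simulate the background creation thread having exited: dropping the
    // receiver is the only way a later send() can fail.
    drop(rx);

    let result = tx.send(1);
    assert!(result.is_err());
    println!("send failed: {}", result.is_err()); // prints "send failed: true"
}
```

Since this only fires when the creator thread has died unexpectedly, exiting and letting the service manager restart the process is one recovery option, but a persistent cause would indeed loop.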
